Green Shift: Denmark's Energy Path¶
02806 Social Data Analysis and Visualization
Authors:
- Diogo Carvalho, s233176
- Matteo Piccagnoni, s232713
- Davide Giuffrida, s233204
How to read this notebook¶
The notebook is structured into key sections: Imports and Variables, 1. Motivation, 2. Basic stats, 3. Data analysis, 4. Genre, 5. Visualizations, 6. Discussion, 7. Contributions, and 8. References with the latter divided as specified in the final project overview. To address questions, we clearly repeat the question statement, include any relevant code, and then provide a concise answer, ensuring our analysis is both clear and navigable.
An example of a question structure is as follows:
- Question or statement regarding the task
# here is code if any in an appropriate codeblock
# there can be multiple of these to ensure readability
Answer/Comments to the question or statement This can also have titles using ## Title to better split the different topics
This approach ensures clarity and makes it easier to navigate through our analysis, providing a direct linkage between the questions posed and our responses.
Imports and variables¶
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px
from sklearn.linear_model import LinearRegression
from urllib.request import urlopen
import json
from bokeh.plotting import figure, show, output_notebook
from bokeh.models import ColumnDataSource, CustomJS
from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource, CustomJS, Slider
from bokeh.layouts import column
from bokeh.models import HoverTool
plt.style.use("fivethirtyeight")
plt.rcParams["font.family"] = "serif"
plt.rcParams["font.serif"] = "Ubuntu"
plt.rcParams["font.monospace"] = "Ubuntu Mono"
plt.rcParams["font.size"] = 10
plt.rcParams["axes.labelsize"] = 10
plt.rcParams["axes.labelweight"] = "bold"
plt.rcParams["xtick.labelsize"] = 8
plt.rcParams["ytick.labelsize"] = 8
plt.rcParams["legend.fontsize"] = 10
plt.rcParams["figure.titlesize"] = 12
# Renders the plot within the Jupyter notebook
output_notebook()
1. Motivation¶
- What is your dataset?
The dataset is an aggregation of data from various reputable entities including Eurostat, Statistics Denmark, Our World in Data, The World Air Quality Index Project and OECD. It integrates diverse sets of information spanning renewable energy adoption, economic impacts, policy responses, and sustainability efforts across Denmark and other countries. This multifaceted approach allows for comprehensive analyses and correlations among different topics, as the data is not confined to a single repository but is collected from multiple sources, each contributing unique insights.
- Why did you choose this/these particular dataset(s)?
As foreign students in Denmark, gaining a deeper understanding of how the country navigates its socio-economic and environmental challenges is crucial for us. We chose datasets from Eurostat, Statistics Denmark, Our World in Data, The World Air Quality Index Project and OECD because they provide a reliable and comprehensive view not just of Denmark’s environmental policies, but also of its economic impacts and broader sustainability efforts. These datasets allow for a nuanced analysis of Denmark's strategies in renewable energy adoption alongside socio-economic indicators, offering a more holistic view of how these strategies align with or impact other sectors. This choice enables us to engage more deeply with the country’s policy landscape and informs our perspective on living and studying in an environment that prioritizes sustainability and innovation.
- What was your goal for the end user's experience?
Our goal for the end user's experience is to create an engaging, smooth, and informative website with interactive graphs that effectively communicate our analysis of Denmark's green energy and sustainability efforts. We aim to provide users with a user-friendly interface that allows them to explore data trends, compare Denmark's performance with other European countries, understand the economic and environmental impacts of renewable energy adoption, and interact with the visualization to have a more personalized experience. We want to empower users to gain insights into Denmark's pioneering strategies for sustainable living and inspire action towards a greener future.
2. Basic stats¶
- Write about your choices in data cleaning and preprocessing?
The data come from different and reliable sources: Our World in Data, Statistic Denmark, Eurostat, The World Air Quality Index Project and OECD.
In the data preprocessing, we integrated multiple tables to meet the requirements of the project. We filtered the data specifically for European countries, with a particular focus on Denmark. This selective approach was taken to align with our analysis and narrative, which primarily revolves around Denmark and its comparison with other European nations. The data were additional filtered based on year, starting from 2000 onwards. Certainly, this value varies from plot to plot as it is higly dependant on the available data and purpose of the plot. We will now detail some of the most relevant data cleaning procedures we have performed. You can see the precise code in later sections.
For the analysis of Denmark's GDP throughout the years we opted to show a trend line which aids the viewer in properly understanding whether this value has been really increasing or decreasing. We created this line directly from code by fitting a Linear Regression model to the data and visualizing the obtained line.
In processing the air quality table, we grouped the data by month to derive monthly averages for air quality, facilitating the creation of a line chart with a trend line. This was created in the same fashion as detailed in the previous paragraph. Furthermore, to address missing values (NaN), we used interpolation, substituting the missing values with averages from the preceding and succeeding months. Additionally, we observed a gap in the data spanning approximately five months (from June 2017 to November 2017). To solve this, we performed further interpolation for these months to ensure accurate values and achieve a smoother graph. While this procedure eliminated a small portion of data variability, it still aided in our purpose as the timespan was relatively insignifcant and the visualisation was much enhanced.
For a detailed overview of our data processing and cleaning procedures, please refer to Section 3 below.
- Write a short section that discusses the dataset stats, containing key points/plots from your exploratory data analysis.
Our exploratory data analysis has highlighted several critical insights, particularly focusing on Denmark's environmental and energy usage trends. Here are the top three findings based on their importance and impact:
- Section 3.1 -> significant reduction in emissions: Both Denmark and Europe have seen significant declines in greenhouse gas emissions since 2000, with Denmark's reductions being particularly pronounced due to effective environmental policies.
- Section 3.2 -> rapid increase in renewable energy: Since 2005, there has been a substantial increase in renewable energy usage across Europe, with Denmark consistently surpassing the EU average, showcasing its commitment to sustainable energy solutions.
- Section 3.8 -> progressive decline in CO2 emissions: Denmark has experienced a notable decrease in CO2 emissions per capita, especially from 2010, indicating the success of its aggressive environmental strategies compared to other European nations.
- Section 3.4 -> shift towards renewables: Denmark has significantly increased its consumption of renewable sources like solar and wind, while decreasing its reliance on coal and oil, aligning with its sustainability goals.
- Section 3.7 -> improving air quality: The trend analysis of PM10 values suggests a general improvement in Denmark’s air quality from 2014 to 2024, with a clear decreasing trend in PM10 concentrations indicating effective air quality management strategies.
These findings underscore the effectiveness of Denmark's environmental strategies and its leadership in adopting sustainable practices, setting a benchmark for global efforts in environmental stewardship and renewable energy adoption.
3. Data analysis¶
- Describe your data analysis and explain what you've learned about the dataset. If relevant, talk about your machine-learning.
As mentioned in the previous section one of the most interesting way we have enhanced some of our visualizations was the creation of the trend line. This line was created by fitting a Linear Regression model to the data and plotting it by extrapolating its coefficients. This process was relatively straightforward and it provides us with a clear discussion point about the visualization.
Apart from the models detailed above, we have not utilized other machine learning algorithms as we did not deem them useful for our visualizations. This was also due to the fact that we have not worked with a unified dataset, but a combination of tables found from different websites. These were already quite specific so no further data analysis was required. We instead decided to focus primarily on the combination of said tables and the quality of the plots themselves.
3.1 Greenhouse gas emissions by country¶
In this subsection, we will explore the greenhouse gas emissions data from different countries, focusing specifically on data from the year 2000 onwards and targeting European nations. Our analysis will begin by examining the structure and key characteristics of the dataset provided by Our World in Data.
We will conduct a detailed exploration of the data, involving filtering the dataset to concentrate on European countries, creating a pivot table to facilitate easier comparison across years, and finally, exporting the refined data for further use and analysis.
Import¶
df = pd.read_csv("./global_emissions_total.csv")
Understand the data¶
# shape
df.shape
(37195, 4)
# columns
df.columns
Index(['Entity', 'Code', 'Year',
'Annual greenhouse gas emissions in CO₂ equivalents'],
dtype='object')
# summary statistics
df.describe()
| Year | Annual greenhouse gas emissions in CO₂ equivalents | |
|---|---|---|
| count | 37195.000000 | 3.719500e+04 |
| mean | 1936.000000 | 4.812280e+08 |
| std | 49.940635 | 2.347735e+09 |
| min | 1850.000000 | -4.547971e+06 |
| 25% | 1893.000000 | 1.890863e+06 |
| 50% | 1936.000000 | 1.469790e+07 |
| 75% | 1979.000000 | 7.788846e+07 |
| max | 2022.000000 | 5.385116e+10 |
Cleaning and Filtering¶
# filter by year >= 2000
df_from_2000 = df[df["Year"] >= 2000].reset_index(drop=True)
# filter by the european countries only
european_countries = [
"Albania",
"Andorra",
"Armenia",
"Austria",
"Azerbaijan",
"Belarus",
"Belgium",
"Bosnia and Herzegovina",
"Bulgaria",
"Croatia",
"Cyprus",
"Czech Republic",
"Denmark",
"Estonia",
"Finland",
"France",
"Germany",
"Georgia",
"Greece",
"Hungary",
"Iceland",
"Ireland",
"Italy",
"Latvia",
"Liechtenstein",
"Lithuania",
"Luxembourg",
"Malta",
"Moldova",
"Monaco",
"Montenegro",
"Netherlands",
"North Macedonia",
"Norway",
"Poland",
"Portugal",
"Romania",
"San Marino",
"Serbia",
"Slovakia",
"Slovenia",
"Spain",
"Sweden",
"Switzerland",
"Turkey",
"Ukraine",
"United Kingdom",
"Vatican City"
]
df_europe_from_2000 = df_from_2000[df_from_2000["Entity"].isin(european_countries)].reset_index(drop=True)
populations = {
}
df_europe_from_2000.head()
| Entity | Code | Year | Annual greenhouse gas emissions in CO₂ equivalents | |
|---|---|---|---|---|
| 0 | Albania | ALB | 2000 | 6823153.5 |
| 1 | Albania | ALB | 2001 | 6972113.0 |
| 2 | Albania | ALB | 2002 | 7476225.0 |
| 3 | Albania | ALB | 2003 | 8287149.0 |
| 4 | Albania | ALB | 2004 | 7693972.5 |
Pivot¶
df_europe_pivot_from_2000 = df_europe_from_2000.pivot(
index='Entity',
columns='Year',
values='Annual greenhouse gas emissions in CO₂ equivalents',
)
df_europe_pivot_from_2000 = df_europe_pivot_from_2000.reset_index()
df_europe_pivot_from_2000.head()
| Year | Entity | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | ... | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Albania | 6823153.5 | 6972113.0 | 7476225.00 | 8287149.0 | 7693972.5 | 9062968.0 | 8137700.00 | 8959703.0 | 9551439.0 | ... | 9441338.00 | 10226878.0 | 9166668.0 | 8819070.00 | 9530445.0 | 8901102.0 | 8549947.0 | 8438481.0 | 8077409.00 | 8034590.50 |
| 1 | Andorra | 679078.7 | 680339.9 | 689972.25 | 695759.0 | 722838.4 | 739627.1 | 704623.06 | 694276.6 | 688835.2 | ... | 604878.44 | 587470.0 | 589425.9 | 592965.06 | 589737.4 | 618268.0 | 606122.0 | 492249.9 | 484959.84 | 482078.66 |
| 2 | Armenia | 6319167.5 | 6965931.0 | 6958924.50 | 7462185.5 | 8549841.0 | 10663547.0 | 9743804.00 | 9629689.0 | 9854290.0 | ... | 9318625.00 | 9268561.0 | 9244813.0 | 9451610.00 | 9246954.0 | 9339961.0 | 9628735.0 | 10088171.0 | 10339451.00 | 9655818.00 |
| 3 | Austria | 75275880.0 | 79152790.0 | 80792370.00 | 87597544.0 | 86467150.0 | 87583040.0 | 85321200.00 | 82976936.0 | 82721770.0 | ... | 75951000.00 | 71922130.0 | 73632300.0 | 74552110.00 | 76979870.0 | 73716000.0 | 75020190.0 | 69069670.0 | 73148264.00 | 68350050.00 |
| 4 | Azerbaijan | 43875056.0 | 42624900.0 | 42571516.00 | 44554864.0 | 46609700.0 | 49713012.0 | 51107704.00 | 51204708.0 | 56980080.0 | ... | 57198050.00 | 58598628.0 | 60215556.0 | 61457564.00 | 59594444.0 | 60587220.0 | 66128540.0 | 65383680.0 | 69963430.00 | 70603160.00 |
5 rows × 24 columns
Per capita¶
# Get all the population of each country so we have per capita
url = "https://restcountries.com/v3.1/all"
response = urlopen(url)
data = response.read().decode("utf-8")
# add population to the dataframe
countries = json.loads(data)
countries_df = pd.json_normalize(countries)
countries_df = countries_df[["name.common", "population"]]
# rename the columns
countries_df.columns = ["Entity", "Population"]
# merge the dataframes
df_europe_pivot_from_2000 = pd.merge(df_europe_pivot_from_2000, countries_df, on="Entity")
# divide each column by the population except the Entity
df_europe_pivot_from_2000.iloc[:, 1:] = df_europe_pivot_from_2000.iloc[:, 1:].div(df_europe_pivot_from_2000["Population"], axis=0)
# remove last column
df_europe_pivot_from_2000 = df_europe_pivot_from_2000.iloc[:, :-1]
Export¶
df_europe_pivot_from_2000.to_csv("./final tables/global_emissions_europe_2000.csv", index=False)
Analysis¶
- Avg. annual greenhouse gas emissions in Europe - Line chart
# Denmark
denmark_emissions = df_europe_pivot_from_2000[df_europe_pivot_from_2000['Entity'] == 'Denmark'].drop('Entity', axis=1).T
# Calculate the average emissions per year across all European countries
average_emissions = df_europe_pivot_from_2000.drop('Entity', axis=1).mean()
fig, ax = plt.subplots(dpi=400)
average_emissions.plot(ax=ax, linewidth=2)
denmark_emissions.plot(ax=ax, linewidth=2, label='Denmark')
ax.set_title("Avg. annual greenhouse gas emissions in Europe")
ax.set_xlabel("Year")
ax.set_ylabel("Greenhouse gas emissions (in CO₂ equivalents)")
plt.legend(["Avg. Europe", "Denmark"])
plt.show()
Key insights:
- Both Denmark and Europe show a decline in emissions since 2000, with Denmark having a more pronounced and consistent reduction.
- Denmark consistently reports lower emissions than the European average, highlighting its effective environmental policies.
- Denmark's strategies in reducing emissions, likely involving robust sustainability policies, could serve as a model for other European countries to achieve similar results.
- Continuing current trends, Denmark could set a benchmark in sustainability, while other European countries might need to intensify their efforts to meet international climate goals.
- Choropleth map to understand each country individually
df_long = df_europe_from_2000.melt(
id_vars=["Entity", "Year"],
value_vars="Annual greenhouse gas emissions in CO₂ equivalents",
var_name="Metric",
value_name="Value",
)
# load europe Geo
with urlopen(
"https://raw.githubusercontent.com/leakyMirror/map-of-europe/master/GeoJSON/europe.geojson"
) as response:
europe_geo = json.load(response)
# Assuming df_long is already created and formatted correctly
fig = px.choropleth_mapbox(
df_long,
locations="Entity",
color="Value",
animation_frame="Year",
hover_name="Entity",
range_color=[df_long["Value"].min(), df_long["Value"].max()],
mapbox_style="carto-positron",
color_continuous_scale="Viridis",
geojson=europe_geo,
featureidkey="properties.NAME",
center={"lat": 51.1657, "lon": 10.4515},
zoom=2,
labels={"Value": "CO₂"},
)
fig.update_layout(
title={
"text": "Annual greenhouse gas emissions CO₂ equivalents",
"y": 0.97,
"x": 0.5,
"xanchor": "center",
"yanchor": "top",
},
)
# Show the figure
# fig.show()
fig.show(renderer='notebook')